Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Text Zone Classification using Unsupervised Feature Learning

Internal identifier: 000018 (Main/Exploration); previous: 000017; next: 000019

Authors: Nibal Nayef [France]; Jean-Marc Ogier [France]

RBID : Hal:hal-01319899

Abstract

Text zone classification is a vital step in the digitization process, without which OCR systems perform poorly. Prior methods for document zone classification have relied on large sets of hand-crafted features for training zone classifiers. Such features are usually database-dependent, and their computation is time-consuming. In this work we propose a novel method for text zone classification that relies on unsupervised feature learning. Within our method, feature vectors of document zones are learned automatically through patch extraction, encoding and pooling, where feature encoding is based on a codebook of visual words. The training phase of the text classifier takes into account the imbalance between text zones and non-text zones of all types. The proposed method has been tested on publicly available standard databases and achieved competitive or better results compared to state-of-the-art methods. The results show that our approach is well suited to the task of text classification and is robust to zone shapes, orientations and sizes.
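The feature-learning pipeline summarized in the abstract (patch extraction, codebook-based encoding, pooling, and imbalance-aware training) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the patch size and stride, the k-means codebook, hard nearest-word assignment, average pooling into a histogram, and inverse-frequency class weights are all choices made here for illustration, and every function name is hypothetical.

```python
# Hypothetical sketch of unsupervised feature learning for document zones.
# Patch size, stride, k-means codebook, hard assignment, average pooling and
# inverse-frequency class weights are illustrative assumptions, not the paper's settings.
import numpy as np


def extract_patches(zone, size=8, stride=4):
    """Slide a size x size window over a 2-D grayscale zone and flatten each patch.
    Assumes the zone is at least size x size pixels."""
    h, w = zone.shape
    patches = [
        zone[r:r + size, c:c + size].ravel()
        for r in range(0, h - size + 1, stride)
        for c in range(0, w - size + 1, stride)
    ]
    return np.asarray(patches, dtype=np.float64)


def normalize(patches, eps=1e-8):
    """Per-patch contrast normalization: zero mean, unit variance."""
    mean = patches.mean(axis=1, keepdims=True)
    std = patches.std(axis=1, keepdims=True)
    return (patches - mean) / (std + eps)


def assign(patches, centers):
    """Index of the nearest centroid (squared Euclidean distance) for every patch."""
    d2 = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def learn_codebook(patches, k=16, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); the centroids act as the 'visual words'."""
    rng = np.random.default_rng(seed)
    centers = patches[rng.choice(len(patches), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(patches, centers)
        for j in range(k):
            members = patches[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def zone_feature(zone, centers, size=8, stride=4):
    """Encode each patch as its nearest visual word, then average-pool into a histogram."""
    patches = normalize(extract_patches(zone, size, stride))
    words = assign(patches, centers)
    hist = np.bincount(words, minlength=len(centers)).astype(np.float64)
    return hist / max(hist.sum(), 1.0)


def class_weights(labels):
    """Inverse-frequency weights so the minority class is not swamped during training."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


if __name__ == "__main__":
    # Random arrays stand in for grayscale zone images.
    rng = np.random.default_rng(1)
    zones = [rng.random((32, 48)) for _ in range(5)]
    train = np.vstack([normalize(extract_patches(z)) for z in zones])
    codebook = learn_codebook(train, k=16)
    feat = zone_feature(zones[0], codebook)
    print(feat.shape)  # (16,)
```

Hard assignment keeps the example short; soft encodings of the patch-to-centroid distances are a common alternative in codebook-based feature learning. The resulting fixed-length histograms, together with the class weights, could then be fed to any standard classifier, which is how the sketch handles zones of arbitrary shape and size.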

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Text Zone Classification using Unsupervised Feature Learning</title>
<author>
<name sortKey="Nayef, Nibal" sort="Nayef, Nibal" uniqKey="Nayef N" first="Nibal" last="Nayef">Nibal Nayef</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-40831" status="VALID">
<orgName>Laboratoire Informatique, Image et Interaction</orgName>
<orgName type="acronym">L3I</orgName>
<desc>
<address>
<addrLine>Bâtiment Pascal Avenue Michel Crépeau F-17042 La Rochelle Cedex 1</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lr.fr/l3i</ref>
</desc>
<listRelation>
<relation name="EA2118" active="#struct-300311" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA2118" active="#struct-300311" type="direct">
<org type="institution" xml:id="struct-300311" status="VALID">
<orgName>Université de La Rochelle</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">La Rochelle</settlement>
<region type="region" nuts="2">Poitou-Charentes</region>
</placeName>
<orgName type="university">Université de La Rochelle</orgName>
</affiliation>
</author>
<author>
<name sortKey="Ogier, Jean Marc" sort="Ogier, Jean Marc" uniqKey="Ogier J" first="Jean-Marc" last="Ogier">Jean-Marc Ogier</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-40831" status="VALID">
<orgName>Laboratoire Informatique, Image et Interaction</orgName>
<orgName type="acronym">L3I</orgName>
<desc>
<address>
<addrLine>Bâtiment Pascal Avenue Michel Crépeau F-17042 La Rochelle Cedex 1</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lr.fr/l3i</ref>
</desc>
<listRelation>
<relation name="EA2118" active="#struct-300311" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA2118" active="#struct-300311" type="direct">
<org type="institution" xml:id="struct-300311" status="VALID">
<orgName>Université de La Rochelle</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">La Rochelle</settlement>
<region type="region" nuts="2">Poitou-Charentes</region>
</placeName>
<orgName type="university">Université de La Rochelle</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-01319899</idno>
<idno type="halId">hal-01319899</idno>
<idno type="halUri">https://hal.archives-ouvertes.fr/hal-01319899</idno>
<idno type="url">https://hal.archives-ouvertes.fr/hal-01319899</idno>
<date when="2015-08-23">2015-08-23</date>
<idno type="wicri:Area/Hal/Corpus">000118</idno>
<idno type="wicri:Area/Hal/Curation">000118</idno>
<idno type="wicri:Area/Hal/Checkpoint">000006</idno>
<idno type="wicri:Area/Main/Merge">000018</idno>
<idno type="wicri:Area/Main/Curation">000018</idno>
<idno type="wicri:Area/Main/Exploration">000018</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Text Zone Classification using Unsupervised Feature Learning</title>
<author>
<name sortKey="Nayef, Nibal" sort="Nayef, Nibal" uniqKey="Nayef N" first="Nibal" last="Nayef">Nibal Nayef</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-40831" status="VALID">
<orgName>Laboratoire Informatique, Image et Interaction</orgName>
<orgName type="acronym">L3I</orgName>
<desc>
<address>
<addrLine>Bâtiment Pascal Avenue Michel Crépeau F-17042 La Rochelle Cedex 1</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lr.fr/l3i</ref>
</desc>
<listRelation>
<relation name="EA2118" active="#struct-300311" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA2118" active="#struct-300311" type="direct">
<org type="institution" xml:id="struct-300311" status="VALID">
<orgName>Université de La Rochelle</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">La Rochelle</settlement>
<region type="region" nuts="2">Poitou-Charentes</region>
</placeName>
<orgName type="university">Université de La Rochelle</orgName>
</affiliation>
</author>
<author>
<name sortKey="Ogier, Jean Marc" sort="Ogier, Jean Marc" uniqKey="Ogier J" first="Jean-Marc" last="Ogier">Jean-Marc Ogier</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-40831" status="VALID">
<orgName>Laboratoire Informatique, Image et Interaction</orgName>
<orgName type="acronym">L3I</orgName>
<desc>
<address>
<addrLine>Bâtiment Pascal Avenue Michel Crépeau F-17042 La Rochelle Cedex 1</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lr.fr/l3i</ref>
</desc>
<listRelation>
<relation name="EA2118" active="#struct-300311" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA2118" active="#struct-300311" type="direct">
<org type="institution" xml:id="struct-300311" status="VALID">
<orgName>Université de La Rochelle</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">La Rochelle</settlement>
<region type="region" nuts="2">Poitou-Charentes</region>
</placeName>
<orgName type="university">Université de La Rochelle</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Text zone classification is a vital step in the digitization process, without which OCR systems perform poorly. Prior methods for document zone classification have relied on large sets of hand-crafted features for training zone classifiers. Such features are usually database-dependent, and their computation is time-consuming. In this work we propose a novel method for text zone classification that relies on unsupervised feature learning. Within our method, feature vectors of document zones are learned automatically through patch extraction, encoding and pooling, where feature encoding is based on a codebook of visual words. The training phase of the text classifier takes into account the imbalance between text zones and non-text zones of all types. The proposed method has been tested on publicly available standard databases and achieved competitive or better results compared to state-of-the-art methods. The results show that our approach is well suited to the task of text classification and is robust to zone shapes, orientations and sizes.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Poitou-Charentes</li>
</region>
<settlement>
<li>La Rochelle</li>
</settlement>
<orgName>
<li>Université de La Rochelle</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Poitou-Charentes">
<name sortKey="Nayef, Nibal" sort="Nayef, Nibal" uniqKey="Nayef N" first="Nibal" last="Nayef">Nibal Nayef</name>
</region>
<name sortKey="Ogier, Jean Marc" sort="Ogier, Jean Marc" uniqKey="Ogier J" first="Jean-Marc" last="Ogier">Jean-Marc Ogier</name>
</country>
</tree>
</affiliations>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000018 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000018 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Hal:hal-01319899
   |texte=   Text Zone Classification using Unsupervised Feature Learning
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024